Comprehending Real Numbers: Development of Bengali Real Number Speech Corpus
Authors
Abstract
Speech recognition has received less attention in the Bengali literature due to the lack of a comprehensive dataset. In this paper, we describe the development of the first comprehensive Bengali speech dataset on real numbers. It covers all the words that may arise in uttering any Bengali real number. The corpus has ten native Bengali speakers from different regions. It comprises more than two thousand speech samples with a total duration of close to four hours. We also provide a detailed analysis of our corpus, highlight some of its notable features, and finally evaluate the performance of two notable Bengali speech recognizers on it.
Similar resources
Web-Based Bengali News Corpus for Lexicon Development and POS Tagging
Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing (NLP) applications. The rapid development of these resources and tools using machine learning techniques for less computerized languages requires appropriately tagged corpus. We have used a Bengali news corpus, developed from the web archive of a widely read Bengali newspaper. The ...
Lexicon Development and POS Tagging Using a Tagged Bengali News Corpus
Lexicon development and Part of Speech (POS) tagging are very important for almost all Natural Language Processing (NLP) application areas. The rapid development of these resources and tools using machine learning techniques for less computerized languages requires appropriately tagged corpus. A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper...
Some properties of fuzzy real numbers
In mathematical analysis, some theorems and definitions are established for both real and fuzzy numbers. In this study, we try to prove Bernoulli's inequality for fuzzy real numbers, along with some of its applications. We also prove two other theorems for fuzzy real numbers that were previously proved for real numbers.
Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers
This paper describes a series of experiments that compare different approaches to training a speaker-independent continuous-speech digit recognizer using the CSLU Toolkit. Comparisons are made between the Hidden Markov Model (HMM) and Neural Network (NN) approaches. In addition, a description of the CSLU Toolkit research environment is given. The CSLU Toolkit is a research and development softwa...
A Hybrid Model for Part-of-Speech Tagging and its Application to Bengali
This paper describes our work on Bengali Part of Speech (POS) tagging using a corpus-based approach. There are several approaches for part of speech tagging. This paper deals with a model that uses a combination of supervised and unsupervised learning using a Hidden Markov Model (HMM). We make use of a small tagged corpus and a large untagged corpus. We also make use of a Morphological Analyzer. ...
Publication date: 2018